Multiple imputation: an alternative to top coding for statistical disclosure control
Abstract
Top coding of extreme values of variables like income is a common method of statistical disclosure control, but it creates problems for the data analyst. The paper proposes two alternative methods to top coding for statistical disclosure control that are based on multiple imputation. We show in simulation studies that the multiple-imputation methods provide better inferences of the publicly released data than top coding, using straightforward multiple-imputation methods of analysis, while maintaining good statistical disclosure control properties. We illustrate the methods on data from the 1995 Chinese household income project.
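The contrast described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the two proposed methods, so what follows is only an assumed stand-in: values above a threshold are replaced with draws from a Pareto tail whose shape parameter is drawn from its posterior under a 1/alpha prior. The function names (`top_code`, `impute_tail`, `sample_mean`), the simulated lognormal incomes, and the 97th-percentile threshold are all hypothetical choices made for illustration.

```python
import math
import random


def sample_mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def top_code(values, threshold):
    """Top coding: every value above the threshold is replaced by the threshold."""
    return [min(v, threshold) for v in values]


def impute_tail(values, threshold, n_imputations, rng):
    """Return n_imputations copies of the data in which each value above the
    threshold is replaced by a synthetic draw from an assumed Pareto tail.

    This is an illustrative model, not the method of the paper.
    """
    exceedances = [v for v in values if v > threshold]
    n_top = len(exceedances)
    if n_top == 0:
        # Nothing to protect: release unchanged copies.
        return [list(values) for _ in range(n_imputations)]
    log_sum = sum(math.log(v / threshold) for v in exceedances)
    datasets = []
    for _ in range(n_imputations):
        # Posterior draw of the Pareto shape under a 1/alpha prior:
        # Gamma with shape n_top and scale 1 / log_sum.
        alpha = rng.gammavariate(n_top, 1.0 / log_sum)
        synthetic = [
            # Inverse-CDF draw; 1 - random() lies in (0, 1], so the result
            # is never below the threshold.
            threshold * (1.0 - rng.random()) ** (-1.0 / alpha)
            if v > threshold
            else v
            for v in values
        ]
        datasets.append(synthetic)
    return datasets


# Simulated incomes (hypothetical data, not the Chinese household survey).
rng = random.Random(0)
income = [rng.lognormvariate(10.0, 1.0) for _ in range(5000)]
threshold = sorted(income)[int(0.97 * len(income))]

top_coded = top_code(income, threshold)
synthetic_sets = impute_tail(income, threshold, n_imputations=5, rng=rng)

# Point estimate from multiple imputation: average the per-dataset estimates.
mi_mean = sample_mean([sample_mean(d) for d in synthetic_sets])

print("true mean      :", round(sample_mean(income), 1))
print("top-coded mean :", round(sample_mean(top_coded), 1))
print("MI mean        :", round(mi_mean, 1))
```

Top coding can only lower the sample mean, because every extreme value is pulled down to the threshold. The imputed data sets keep a tail above the threshold while no longer exposing the original extreme values. Variance estimation would require the appropriate multiple-imputation combining rules, which this sketch leaves out.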
Similar resources
A multiple imputation approach to disclosure limitation for high-age individuals in longitudinal studies.
Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems created by high ages in cohort studies. Because o...
Using Multiple Imputation Technique to Correct for Measurement Error and Statistical Disclosure Control in Sensitive Count Data in a National Survey
Measurement error in sensitive questions is pervasive and therefore biases the estimation of most statistical models. The objective of this paper is to correct for measurement error in the number of lifetime sexual partners by treating it as a missing-data problem and using a multiple imputation technique to synthesize this underlying true attribute. Bayesian Poisson model with diffuse Gaussian ...
Multiple Imputation for Disclosure Limitation: Future Research Challenges
Statistical agencies that disseminate data to the public are ethically and often legally required to protect the confidentiality of respondents’ identities and sensitive attributes. To satisfy these requirements, Rubin (1993), Little (1993), and Fienberg (1994) proposed that agencies utilize multiple imputation. For example, agencies can release the units originally surveyed with some values, s...
Combining synthetic data with subsampling to create public use microdata files for large scale surveys
To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants’ confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemi...
Disclosure Control in Business Data - Experiences with Multiply Imputed Synthetic Datasets for the German IAB Establishment Survey
Generating synthetic datasets based on the ideas of multiple imputation is an innovative method for statistical disclosure control. The basic idea is to replace the values for some confidential variables X with several draws from the posterior predictive distribution of X given some non confidential variables Y. Since the synthetic values are based on models for the joint distribution of the da...